Fix Flaky results on Library and Gateway controller test#2039

Merged
istio-testing merged 3 commits into
istio-ecosystem:mainfrom
fjglira:fix-flay-e2e
Jun 29, 2026
Conversation

@fjglira

@fjglira fjglira commented Jun 26, 2026

Contributor

We are getting flaky results on this test; adding an Eventually wrapper can help us avoid these situations.

What type of PR is this?

  • Enhancement / New Feature
  • Bug Fix
  • Refactor
  • Optimization
  • Test
  • Documentation Update

What this PR does / why we need it:

Which issue(s) this PR fixes:

Fixes #2038

Related Issue/PR #

Additional information:

We are getting flaky results on this test; adding an Eventually wrapper can help us avoid these situations.

Signed-off-by: Francisco Herrera <fjglira@gmail.com>
@fjglira fjglira requested a review from a team as a code owner June 26, 2026 16:43
@codecov

codecov Bot commented Jun 26, 2026


Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 76.57%. Comparing base (5d72f81) to head (087f2a9).
⚠️ Report is 2 commits behind head on main.

Additional details and impacted files
@@           Coverage Diff           @@
##             main    #2039   +/-   ##
=======================================
  Coverage   76.57%   76.57%           
=======================================
  Files          58       58           
  Lines        3181     3181           
=======================================
  Hits         2436     2436           
  Misses        608      608           
  Partials      137      137           
| Flag | Coverage Δ |
| --- | --- |
| integration-tests | 70.28% <ø> (ø) |
| unit-tests | 52.56% <ø> (ø) |

g.Expect(ready).To(BeTrue(), "curl-client pod is not ready")
}).Should(Succeed())
Success("All pods in gateway namespace are ready")
Success("Curl client pod is ready")

Contributor Author


I need to check whether this works: we are now only validating that the curl pod is running, whereas the previous step validated that all the gateway deployments were Ready.
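The trade-off raised in this comment can be illustrated with a small, self-contained Go sketch. It is not the code from the test suite: the `pod` type, the two helper functions, and the `gateway-istio` pod name are hypothetical stand-ins chosen for illustration only.

```go
package main

import "fmt"

// pod is a hypothetical, minimal stand-in for a Kubernetes Pod:
// only its name and its Ready condition are modeled.
type pod struct {
	Name  string
	Ready bool
}

// allPodsReady is the broader check: every pod in the namespace must be Ready.
// An empty namespace is treated as "not ready".
func allPodsReady(pods []pod) bool {
	if len(pods) == 0 {
		return false
	}
	for _, p := range pods {
		if !p.Ready {
			return false
		}
	}
	return true
}

// podReady is the narrower check: only the named pod must be Ready.
// A missing pod is treated as "not ready".
func podReady(pods []pod, name string) bool {
	for _, p := range pods {
		if p.Name == name {
			return p.Ready
		}
	}
	return false
}

func main() {
	// Assumed namespace content: the client is up, a gateway pod is still starting.
	pods := []pod{
		{Name: "curl-client", Ready: true},
		{Name: "gateway-istio", Ready: false},
	}
	fmt.Println("all pods ready:", allPodsReady(pods))
	fmt.Println("curl-client ready:", podReady(pods, "curl-client"))
}
```

In this sketch the narrow check passes while a gateway pod is still not Ready, which is the situation the comment asks to double-check: the targeted check avoids flakes from unrelated pods, but it no longer proves the gateway itself is up.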

Copilot AI left a comment

Contributor


Pull request overview

This PR addresses flaky E2E failures in the Sail Operator test suite by making readiness/drift assertions more resilient to eventual consistency and background reconciliation, specifically in the Library reconciliation and Gateway controller E2E tests.

Changes:

  • Wrap the ValidatingWebhookConfiguration drift mutation in an Eventually block to tolerate concurrent reconciler updates.
  • Narrow the Gateway controller “pod ready” check to the specific curl-client pod rather than requiring all pods in the gateway namespace to be ready.
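Why re-reading the object inside the retried function matters can be sketched with the standard library alone. The example below is an assumption-laden illustration, not the PR's code: `fakeStore`, `webhookConfig`, `eventually`, and `mutateWebhook` are hypothetical names, and `eventually` is only a simplified stand-in for Gomega's `Eventually`.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// errConflict mimics the conflict error returned for a stale resourceVersion.
var errConflict = errors.New("conflict: object has been modified")

// webhookConfig is a hypothetical, trimmed-down stand-in for a
// ValidatingWebhookConfiguration.
type webhookConfig struct {
	ResourceVersion int
	Name            string
}

// fakeStore is a hypothetical in-memory API server with optimistic concurrency.
type fakeStore struct {
	obj           webhookConfig
	pendingWrites int // simulated reconciler writes that will still race with us
}

// Get returns a copy of the current object.
func (s *fakeStore) Get() webhookConfig { return s.obj }

// Update rejects writes that carry a stale ResourceVersion.
func (s *fakeStore) Update(in webhookConfig) error {
	if s.pendingWrites > 0 {
		// Simulate the background reconciler writing just before our update lands.
		s.pendingWrites--
		s.obj.ResourceVersion++
	}
	if in.ResourceVersion != s.obj.ResourceVersion {
		return errConflict
	}
	in.ResourceVersion++
	s.obj = in
	return nil
}

// eventually retries fn until it succeeds or the timeout expires.
func eventually(timeout, interval time.Duration, fn func() error) (int, error) {
	deadline := time.Now().Add(timeout)
	attempts := 0
	for {
		attempts++
		err := fn()
		if err == nil {
			return attempts, nil
		}
		if time.Now().After(deadline) {
			return attempts, fmt.Errorf("timed out after %d attempts: %w", attempts, err)
		}
		time.Sleep(interval)
	}
}

// mutateWebhook performs get, mutate, update as one retried unit, so each
// attempt starts from a fresh ResourceVersion.
func mutateWebhook(s *fakeStore, name string) (int, error) {
	return eventually(time.Second, time.Millisecond, func() error {
		obj := s.Get() // re-read on every attempt
		obj.Name = name
		return s.Update(obj)
	})
}

func main() {
	s := &fakeStore{obj: webhookConfig{ResourceVersion: 1, Name: "original"}, pendingWrites: 2}
	attempts, err := mutateWebhook(s, "xyz.xyz.xyz")
	fmt.Println(attempts, err, s.Get().Name) // prints "3 <nil> xyz.xyz.xyz"
}
```

The design point is that the read happens inside the retried closure. If the object were fetched once outside the loop, every retry would resend the same stale version and keep hitting the conflict.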

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.

| File | Description |
| --- | --- |
| tests/e2e/library/library_reconcile_test.go | Retries webhook drift mutation to reduce flakes from concurrent reconciliation. |
| tests/e2e/gatewaycontroller/gateway_controller_test.go | Makes the curl-client readiness check more targeted to avoid unrelated pod readiness causing flakes. |


Comment on lines +279 to +285
// The background reconciler may update the webhook concurrently; retry on conflict.
Eventually(func(g Gomega) {
	webhook := &admissionv1.ValidatingWebhookConfiguration{}
	g.Expect(cl.Get(ctx, webhookKey, webhook)).To(Succeed())
	webhook.Webhooks[0].Name = "xyz.xyz.xyz"
	webhook.Webhooks[0].FailurePolicy = ptr.Of(admissionv1.Fail)
	g.Expect(cl.Update(ctx, webhook)).To(Succeed())
@MaxBab

MaxBab commented Jun 29, 2026

Collaborator

@fjglira

Looks ok to me.
I've added the hold label so that the PR does not get merged right after my approval, in case you would like to make more changes.
If not, just remove the hold label.

Signed-off-by: Francisco Herrera <fjglira@gmail.com>
@fjglira

fjglira commented Jun 29, 2026

Contributor Author

I had to do the change manually

@fjglira

fjglira commented Jun 29, 2026

Contributor Author

It seems there is an additional flaky test now:

Summarizing 1 Failure:
  [FAIL] Control Plane updates In-Place Updates Updating from  to  when workloads are deployed in sidecar mode [It] should have connectivity with old version [control-plane, update, slow, sidecar]
  /home/prow/go/src/github.com/istio-ecosystem/sail-operator/tests/e2e/util/common/e2e_utils.go:572
Ran 75 of 86 Specs in 445.211 seconds
FAIL! -- 74 Passed | 1 Failed | 0 Pending | 11 Skipped
--- FAIL: TestControlPlane (445.22s)
FAIL

Checking to see if I can fix this in the same PR.

Signed-off-by: Francisco Herrera <fjglira@gmail.com>
@fjglira

fjglira commented Jun 29, 2026

Contributor Author

@MaxBab it failed before, as my last messages show. I pushed a new commit with new connectivity validation logic to avoid this kind of issue.

@istio-testing istio-testing merged commit 86b79a9 into istio-ecosystem:main Jun 29, 2026
17 checks passed
openshift-service-mesh-bot pushed a commit to openshift-service-mesh-bot/sail-operator that referenced this pull request Jun 30, 2026
* upstream/main:
  Fix Flaky results on Library and Gateway controller test (istio-ecosystem#2039)
  docs: fix inaccurate "ambient mode only supports InPlace" update-strategy note (istio-ecosystem#2005)
  test(e2e): write debug info to artifacts only, print when E2E_PRINT_DEBUG=true (istio-ecosystem#2031)
  fix(e2e): initialize controller-runtime logger in ambient suite (istio-ecosystem#2030)
  test: Fix PodSecurity error on gateway test while running on OCP clusters (istio-ecosystem#2025)
  test: organization of labels used on e2e test (istio-ecosystem#2026)
  Automator: Update dependencies in istio-ecosystem/sail-operator@main (istio-ecosystem#2024)
  Automator: Update dependencies in istio-ecosystem/sail-operator@main (istio-ecosystem#2020)
  Automator: Update dependencies in istio-ecosystem/sail-operator@main (istio-ecosystem#2013)
  fix(install): align library drift watch with Helm label (istio-ecosystem#2017)
  Sail library crds fixes (istio-ecosystem#2014)
  test(revision): fix flaky TestPruneInactive requeueAfter assertion (istio-ecosystem#2015)
  Add E2E migration test (istio-ecosystem#1974)
  Automator: Update dependencies in istio-ecosystem/sail-operator@main (istio-ecosystem#2010)
  Automate FBC upgrade graph flow during release workflow (istio-ecosystem#1994)
  Improve tests logs gathering (istio-ecosystem#2007)
  Fix race condition in `ToDiscoveryClient` (istio-ecosystem#2003)
  Automator: Update dependencies in istio-ecosystem/sail-operator@main (istio-ecosystem#2006)
  Automator: Update dependencies in istio-ecosystem/sail-operator@main (istio-ecosystem#2000)
openshift-service-mesh-bot pushed a commit to openshift-service-mesh-bot/sail-operator that referenced this pull request Jul 1, 2026
* upstream/main:
  Fix update-deps job (istio-ecosystem#2047)
  Fix Flaky results on Library and Gateway controller test (istio-ecosystem#2039)
  docs: fix inaccurate "ambient mode only supports InPlace" update-strategy note (istio-ecosystem#2005)
  test(e2e): write debug info to artifacts only, print when E2E_PRINT_DEBUG=true (istio-ecosystem#2031)
  fix(e2e): initialize controller-runtime logger in ambient suite (istio-ecosystem#2030)
  test: Fix PodSecurity error on gateway test while running on OCP clusters (istio-ecosystem#2025)
  test: organization of labels used on e2e test (istio-ecosystem#2026)
  Automator: Update dependencies in istio-ecosystem/sail-operator@main (istio-ecosystem#2024)
  Automator: Update dependencies in istio-ecosystem/sail-operator@main (istio-ecosystem#2020)
  Automator: Update dependencies in istio-ecosystem/sail-operator@main (istio-ecosystem#2013)
  fix(install): align library drift watch with Helm label (istio-ecosystem#2017)
  Sail library crds fixes (istio-ecosystem#2014)
  test(revision): fix flaky TestPruneInactive requeueAfter assertion (istio-ecosystem#2015)
  Add E2E migration test (istio-ecosystem#1974)
  Automator: Update dependencies in istio-ecosystem/sail-operator@main (istio-ecosystem#2010)
  Automate FBC upgrade graph flow during release workflow (istio-ecosystem#1994)
  Improve tests logs gathering (istio-ecosystem#2007)
  Fix race condition in `ToDiscoveryClient` (istio-ecosystem#2003)
  Automator: Update dependencies in istio-ecosystem/sail-operator@main (istio-ecosystem#2006)
  Automator: Update dependencies in istio-ecosystem/sail-operator@main (istio-ecosystem#2000)
Development

Successfully merging this pull request may close these issues.

Fix TestLibrary and Gateway test flaky results

4 participants